H2N2 exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections

Internal identifier: 000109 (France/Analysis); previous: 000108; next: 000110

Authors: Yohann Mansiaux [France]; Fabrice Carrat [France]

Source: BMC Medical Research Methodology [ISSN 1471-2288]; 2014

RBID : Hal:inserm-01098222

Abstract

Background: Big data is steadily growing in epidemiology. We explored the performance of methods dedicated to big data analysis for detecting independent associations between exposures and a health outcome.

Methods: We searched for associations between 303 covariates and influenza infection in 498 subjects (14% infected) sampled from a dedicated cohort. Independent associations were detected using two data mining methods, Random Forests (RF) and Boosted Regression Trees (BRT); the conventional logistic regression framework (Univariate Followed by Multivariate Logistic Regression, UFMLR); and the Least Absolute Shrinkage and Selection Operator (LASSO), which applies a penalty in multivariate logistic regression to achieve a sparse selection of covariates. We developed permutation tests to assess the statistical significance of associations. We simulated 500 similarly sized datasets to estimate the True (TPR) and False (FPR) Positive Rates associated with these methods.

Results: Between 3 and 24 covariates (1%-8%) were identified as associated with influenza infection, depending on the method. The pre-seasonal haemagglutination inhibition antibody titer was the only covariate selected by all methods, while 266 (87%) covariates were not selected by any method. At the 5% nominal significance level, the TPRs were 85% with RF, 80% with BRT, 26% to 49% with UFMLR, and 71% to 78% with LASSO. Conversely, the FPRs were 4% with RF and BRT, 9% to 2% with UFMLR, and 9% to 4% with LASSO.

Conclusions: Data mining methods and LASSO should be considered valuable methods for detecting independent associations in large epidemiologic datasets.
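The abstract mentions permutation tests and true/false positive rates without giving details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single binary exposure and a binary outcome, and swaps in a simple risk difference as the test statistic (the abstract does not say which statistic was used with RF, BRT, UFMLR or LASSO). The function names `permutation_p_value` and `positive_rates`, and all parameters, are hypothetical.

```python
import random


def permutation_p_value(exposure, outcome, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a binary exposure/outcome pair.

    Assumption: the statistic is the absolute difference in outcome
    proportion between exposed and unexposed subjects (illustrative only).
    """
    def risk_difference(out):
        exposed = [o for e, o in zip(exposure, out) if e]
        unexposed = [o for e, o in zip(exposure, out) if not e]
        return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

    rng = random.Random(seed)
    observed = abs(risk_difference(outcome))
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the outcome breaks any exposure/outcome link,
        # which simulates the null hypothesis of no association.
        rng.shuffle(shuffled)
        if abs(risk_difference(shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def positive_rates(selected, truly_associated, all_covariates):
    """Return (TPR, FPR) of a selection against a known simulated truth."""
    selected, truth = set(selected), set(truly_associated)
    null = set(all_covariates) - truth
    tpr = len(selected & truth) / len(truth)
    fpr = len(selected & null) / len(null)
    return tpr, fpr
```

In a simulation setting, `positive_rates` would be applied to each simulated dataset, where the truly associated covariates are known by construction, and the resulting rates averaged over datasets.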


URL: https://www.hal.inserm.fr/inserm-01098222
DOI: 10.3201/eid1610.100516


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections</title>
<author>
<name sortKey="Mansiaux, Yohann" sort="Mansiaux, Yohann" uniqKey="Mansiaux Y" first="Yohann" last="Mansiaux">Yohann Mansiaux</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267618" status="OLD">
<idno type="RNSR">201420917E</idno>
<orgName>Institut Pierre Louis d'Epidémiologie et de Santé Publique</orgName>
<orgName type="acronym">iPLESP</orgName>
<desc>
<address>
<addrLine>56, boulevard Vincent Auriol - CS 81393 - 75646 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">https://www.iplesp.upmc.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-93591" type="direct"></relation>
<relation name="U1136" active="#struct-303623" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-93591" type="direct">
<org type="institution" xml:id="struct-93591" status="OLD">
<orgName>Université Pierre et Marie Curie - Paris 6</orgName>
<orgName type="acronym">UPMC</orgName>
<date type="end">2017-12-31</date>
<desc>
<address>
<addrLine>4 place Jussieu - 75005 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmc.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="U1136" active="#struct-303623" type="direct">
<org type="institution" xml:id="struct-303623" status="VALID">
<idno type="IdRef">026388278</idno>
<orgName>Institut National de la Santé et de la Recherche Médicale</orgName>
<orgName type="acronym">INSERM</orgName>
<desc>
<address>
<addrLine>101, rue de Tolbiac, 75013 Paris </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inserm.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Carrat, Fabrice" sort="Carrat, Fabrice" uniqKey="Carrat F" first="Fabrice" last="Carrat">Fabrice Carrat</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267618" status="OLD">
<idno type="RNSR">201420917E</idno>
<orgName>Institut Pierre Louis d'Epidémiologie et de Santé Publique</orgName>
<orgName type="acronym">iPLESP</orgName>
<desc>
<address>
<addrLine>56, boulevard Vincent Auriol - CS 81393 - 75646 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">https://www.iplesp.upmc.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-93591" type="direct"></relation>
<relation name="U1136" active="#struct-303623" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-93591" type="direct">
<org type="institution" xml:id="struct-93591" status="OLD">
<orgName>Université Pierre et Marie Curie - Paris 6</orgName>
<orgName type="acronym">UPMC</orgName>
<date type="end">2017-12-31</date>
<desc>
<address>
<addrLine>4 place Jussieu - 75005 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmc.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="U1136" active="#struct-303623" type="direct">
<org type="institution" xml:id="struct-303623" status="VALID">
<idno type="IdRef">026388278</idno>
<orgName>Institut National de la Santé et de la Recherche Médicale</orgName>
<orgName type="acronym">INSERM</orgName>
<desc>
<address>
<addrLine>101, rue de Tolbiac, 75013 Paris </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inserm.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:inserm-01098222</idno>
<idno type="halId">inserm-01098222</idno>
<idno type="halUri">https://www.hal.inserm.fr/inserm-01098222</idno>
<idno type="url">https://www.hal.inserm.fr/inserm-01098222</idno>
<idno type="doi">10.3201/eid1610.100516</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Hal/Corpus">000051</idno>
<idno type="wicri:Area/Hal/Curation">000051</idno>
<idno type="wicri:Area/Hal/Checkpoint">000150</idno>
<idno type="wicri:explorRef" wicri:stream="Hal" wicri:step="Checkpoint">000150</idno>
<idno type="wicri:doubleKey">1471-2288:2014:Mansiaux Y:detection:of:independent</idno>
<idno type="wicri:Area/Main/Merge">000645</idno>
<idno type="wicri:Area/Main/Curation">000644</idno>
<idno type="wicri:Area/Main/Exploration">000644</idno>
<idno type="wicri:Area/France/Extraction">000109</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections</title>
<author>
<name sortKey="Mansiaux, Yohann" sort="Mansiaux, Yohann" uniqKey="Mansiaux Y" first="Yohann" last="Mansiaux">Yohann Mansiaux</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267618" status="OLD">
<idno type="RNSR">201420917E</idno>
<orgName>Institut Pierre Louis d'Epidémiologie et de Santé Publique</orgName>
<orgName type="acronym">iPLESP</orgName>
<desc>
<address>
<addrLine>56, boulevard Vincent Auriol - CS 81393 - 75646 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">https://www.iplesp.upmc.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-93591" type="direct"></relation>
<relation name="U1136" active="#struct-303623" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-93591" type="direct">
<org type="institution" xml:id="struct-93591" status="OLD">
<orgName>Université Pierre et Marie Curie - Paris 6</orgName>
<orgName type="acronym">UPMC</orgName>
<date type="end">2017-12-31</date>
<desc>
<address>
<addrLine>4 place Jussieu - 75005 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmc.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="U1136" active="#struct-303623" type="direct">
<org type="institution" xml:id="struct-303623" status="VALID">
<idno type="IdRef">026388278</idno>
<orgName>Institut National de la Santé et de la Recherche Médicale</orgName>
<orgName type="acronym">INSERM</orgName>
<desc>
<address>
<addrLine>101, rue de Tolbiac, 75013 Paris </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inserm.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Carrat, Fabrice" sort="Carrat, Fabrice" uniqKey="Carrat F" first="Fabrice" last="Carrat">Fabrice Carrat</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-267618" status="OLD">
<idno type="RNSR">201420917E</idno>
<orgName>Institut Pierre Louis d'Epidémiologie et de Santé Publique</orgName>
<orgName type="acronym">iPLESP</orgName>
<desc>
<address>
<addrLine>56, boulevard Vincent Auriol - CS 81393 - 75646 Paris Cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">https://www.iplesp.upmc.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-93591" type="direct"></relation>
<relation name="U1136" active="#struct-303623" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-93591" type="direct">
<org type="institution" xml:id="struct-93591" status="OLD">
<orgName>Université Pierre et Marie Curie - Paris 6</orgName>
<orgName type="acronym">UPMC</orgName>
<date type="end">2017-12-31</date>
<desc>
<address>
<addrLine>4 place Jussieu - 75005 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.upmc.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="U1136" active="#struct-303623" type="direct">
<org type="institution" xml:id="struct-303623" status="VALID">
<idno type="IdRef">026388278</idno>
<orgName>Institut National de la Santé et de la Recherche Médicale</orgName>
<orgName type="acronym">INSERM</orgName>
<desc>
<address>
<addrLine>101, rue de Tolbiac, 75013 Paris </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inserm.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</analytic>
<idno type="DOI">10.3201/eid1610.100516</idno>
<series>
<title level="j">BMC Medical Research Methodology</title>
<idno type="ISSN">1471-2288</idno>
<imprint>
<date type="datePub">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Background: Big data is steadily growing in epidemiology. We explored the performance of methods dedicated to big data analysis for detecting independent associations between exposures and a health outcome. Methods: We searched for associations between 303 covariates and influenza infection in 498 subjects (14% infected) sampled from a dedicated cohort. Independent associations were detected using two data mining methods, Random Forests (RF) and Boosted Regression Trees (BRT); the conventional logistic regression framework (Univariate Followed by Multivariate Logistic Regression, UFMLR); and the Least Absolute Shrinkage and Selection Operator (LASSO), which applies a penalty in multivariate logistic regression to achieve a sparse selection of covariates. We developed permutation tests to assess the statistical significance of associations. We simulated 500 similarly sized datasets to estimate the True (TPR) and False (FPR) Positive Rates associated with these methods. Results: Between 3 and 24 covariates (1%-8%) were identified as associated with influenza infection, depending on the method. The pre-seasonal haemagglutination inhibition antibody titer was the only covariate selected by all methods, while 266 (87%) covariates were not selected by any method. At the 5% nominal significance level, the TPRs were 85% with RF, 80% with BRT, 26% to 49% with UFMLR, and 71% to 78% with LASSO. Conversely, the FPRs were 4% with RF and BRT, 9% to 2% with UFMLR, and 9% to 4% with LASSO. Conclusions: Data mining methods and LASSO should be considered valuable methods for detecting independent associations in large epidemiologic datasets.</p>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
</list>
<tree>
<country name="France">
<noRegion>
<name sortKey="Mansiaux, Yohann" sort="Mansiaux, Yohann" uniqKey="Mansiaux Y" first="Yohann" last="Mansiaux">Yohann Mansiaux</name>
</noRegion>
<name sortKey="Carrat, Fabrice" sort="Carrat, Fabrice" uniqKey="Carrat F" first="Fabrice" last="Carrat">Fabrice Carrat</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000109 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000109 | SxmlIndent | more
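
Once a record has been retrieved with one of the commands above and saved as text, it can be read outside Dilib. The sketch below is one possible way to do this in Python and is not part of Dilib. It assumes the record looks like the XML shown on this page, where the `wicri:` and `hal:` prefixes are used without being declared. The wrapper element, the placeholder namespace URIs, and the function names `parse_record` and `summarize_record` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the record uses the wicri: and hal: prefixes without declaring
# them, which a strict XML parser rejects. Wrapping the record in a root
# element that binds both prefixes to placeholder URIs (made up for this
# sketch) lets the standard parser accept it.
WRAPPER = (
    '<root xmlns:wicri="urn:example:wicri" '
    'xmlns:hal="urn:example:hal">{}</root>'
)


def parse_record(record_xml):
    """Parse one <record>...</record> string and return its root element."""
    return ET.fromstring(WRAPPER.format(record_xml)).find("record")


def summarize_record(record_xml):
    """Extract the title, author names and DOI from a record string."""
    record = parse_record(record_xml)
    return {
        "title": record.findtext(".//titleStmt/title"),
        "authors": [n.text for n in record.findall(".//titleStmt/author/name")],
        "doi": record.findtext(".//publicationStmt/idno[@type='doi']"),
    }
```

Applied to the record on this page, `summarize_record` would be expected to return the article title, the two author names from `titleStmt`, and the value of the `idno` element whose type is `doi`.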

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     Hal:inserm-01098222
   |texte=   Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021